Picture for Lei Xie

Lei Xie

Nanjing University

SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription

Add code
Jun 01, 2026
Viaarxiv icon

InfoMerge: Information-aware Token Compression for Efficient Video Large Language Models

Add code
Jun 01, 2026
Viaarxiv icon

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model

Add code
May 12, 2026
Viaarxiv icon

Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding

Add code
Apr 24, 2026
Viaarxiv icon

Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge

Add code
Apr 23, 2026
Viaarxiv icon

MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech

Add code
Apr 20, 2026
Viaarxiv icon

Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models

Add code
Apr 14, 2026
Viaarxiv icon

HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models

Add code
Apr 13, 2026
Viaarxiv icon

EvoTSE: Evolving Enrollment for Target Speaker Extraction

Add code
Apr 09, 2026
Viaarxiv icon

FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection

Add code
Apr 07, 2026
Viaarxiv icon